Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition
Abstract
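Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to that of the LSTM model, and the method of adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times compared to the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition. This approach also offers faster computation time than traditional methods.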
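The two ideas the abstract relies on, a triplet loss over speaker embeddings and the concatenation of MFCC and AE features before the recurrent model, can be illustrated with a minimal sketch. The abstract does not state the distance metric, the margin, or the feature dimensions, so the Euclidean distance, the default `margin=1.0`, and the helper names `triplet_loss` and `concat_features` are assumptions made for illustration only, not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    anchor and positive are embeddings of the same speaker, negative is
    an embedding of a different speaker. The margin value is an assumption.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


def concat_features(mfcc_frames, ae_frames):
    """Concatenate per-frame MFCC vectors with per-frame AE vectors.

    Both inputs are lists of frames (lists of floats) of equal length;
    the result has one longer feature vector per frame.
    """
    if len(mfcc_frames) != len(ae_frames):
        raise ValueError("MFCC and AE sequences must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, ae_frames)]


if __name__ == "__main__":
    # Negative is far from the anchor, so the margin is already satisfied.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Negative is as close as the positive, so the loss equals the margin.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
    # One frame with 2 MFCC values and 1 AE value becomes a 3-value frame.
    print(concat_features([[1.0, 2.0]], [[3.0]]))
```

In a full system the concatenated frames would be the input sequence of the BLSTM, and the loss would be computed on the embeddings the network produces; here plain lists stand in for both.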
Similar resources
Mel Frequency Cepstral Coefficients for Speaker Recognition Using Gaussian Mixture Model-Artificial Neural Network Model
Speaker recognition (SP) is a topic of great significance in the areas of intelligence and security. Biometric SP is an automated method of verifying or recognizing a person's identity on the basis of some trait, such as a fingerprint, face pattern, or voice. Many methods proposed in the literature focus on front-end processing such as PLP and LPC. In this pa...
The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition
Speech recognition makes an important contribution to promoting new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires several stages before the desired output is obtained. Among automatic speech recognition (ASR) components is the feature ex...
Mel, linear, and antimel frequency cepstral coefficients in broad phonetic regions for telephone speaker recognition
We’ve examined the speaker discriminative power of mel-, antimel- and linear-frequency cepstral coefficients (MFCCs, aMFCCs and LFCCs) in the nasal, vowel, and non-nasal consonant speech regions. Our inspiration came from the work of Lu and Dang in 2007, who showed that filterbank energies at some frequencies mainly outside the telephone bandwidth possess more speaker discriminative power due to ...
Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition
The focus of a continuous speech recognition process is to match an input signal with a set of words or sentences according to some optimality criteria. The first step of this process is parameterization, whose major task is data reduction by converting the input signal into parameters while preserving virtually all of the speech signal information dealing with the text message. This contributi...
Mel Frequency Cepstral Coefficients for Music Modeling
We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and investigate their applicability to modeling music. In particular, we examine two of the main assumptions of the process of forming MFCCs: the use of the Mel frequency scale to model the spectra; and the use of the Discrete Cosine Transform (DCT) to decorrelate the Mel-spec...
Journal
Journal title: Applied Sciences
Year: 2023
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13127008